NSF PAR Search | NSF Public Access Repository

Replicability in Learning: Geometric Partitions and Sperner-KKM Lemma

Vander_Woude, Jason; Dixon, Peter; Pavan, A; Radcliffe, Jamie; Vinodchandran, N V (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track.)

Full Text Available
Total Variation Distance Meets Probabilistic Inference

Bhattacharyya, Arnab; Gayen, Sutanu; Meel, Kuldeep; Myrisiotis, Dimitrios; Pavan, A; Vinodchandran, N V (July 2024, Proceedings of Machine Learning Research)

Full Text Available
On the Feasibility of Forgetting in Data Streams

https://doi.org/10.1145/3651603

Pavan, A; Chakraborty, Sourav; Vinodchandran, N V; Meel, Kuldeep S (May 2024, Proceedings of the ACM on Management of Data)

In today's digital age, it is becoming increasingly prevalent to retain digital footprints in the cloud indefinitely. Nonetheless, there is a valid argument that entities should have the authority to decide whether their personal data remains within a specific database or is expunged. Indeed, nations across the globe are increasingly enacting legislation to uphold the Right To Be Forgotten for individuals. Investigating computational challenges, including the formalization and implementation of this notion, is crucial due to its relevance in the domains of data privacy and management. This work introduces a new streaming model: the 'Right to be Forgotten Data Streaming Model' (RFDS model). The main feature of this model is that any element in the stream has the right to have its history removed from the stream. Formally, the input is a stream of updates of the form (a, Δ) where Δ ∈ {+, ⊥} and a is an element from a universe U. When the update Δ = + occurs, the frequency of a, denoted as f_a, is incremented to f_a + 1. When the update Δ = ⊥ occurs, f_a is set to 0. This feature, which represents the forget request, distinguishes the present model from existing data streaming models. This work systematically investigates computational challenges that arise while incorporating the notion of the right to be forgotten. Our initial considerations reveal that even estimating F₁ (sum of the frequencies of elements) of the stream is a non-trivial problem in this model. Based on the initial investigations, we focus on a modified model which we call α-RFDS where we limit the number of forget operations to be at most an α fraction. In this modified model, we focus on estimating F₀ (number of distinct elements) and F₁. We present algorithms and establish almost-matching lower bounds on the space complexity for these computational tasks.
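The update rule described in this abstract can be made concrete with a minimal sketch. It is an exact, linear-space baseline and not one of the paper's small-space algorithms; the names `apply_updates`, `moments`, and `FORGET` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

FORGET = "⊥"  # hypothetical marker for the forget request


def apply_updates(stream):
    """Replay RFDS updates (a, delta) and return the exact frequency table."""
    freq = defaultdict(int)
    for a, delta in stream:
        if delta == "+":
            freq[a] += 1          # f_a is incremented to f_a + 1
        elif delta == FORGET:
            freq.pop(a, None)     # f_a is set to 0: the history of a is dropped
        else:
            raise ValueError(f"unknown update type: {delta!r}")
    return dict(freq)


def moments(freq):
    """Return (F0, F1): number of distinct elements and sum of frequencies."""
    f0 = sum(1 for count in freq.values() if count > 0)
    f1 = sum(freq.values())
    return f0, f1


stream = [("x", "+"), ("y", "+"), ("x", "+"), ("x", FORGET), ("z", "+"), ("x", "+")]
print(moments(apply_updates(stream)))
```

Storing one counter per element is exactly what a streaming algorithm tries to avoid; the sketch only fixes the semantics that any space-efficient estimator of F₀ or F₁ would have to approximate.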
Full Text Available
List and Certificate Complexities in Replicable Learning

Dixon, Peter; Pavan, A; Vander_Woude, Jason; Vinodchandran, N_V (December 2023, Curran Associates, Inc.)
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
We investigate replicable learning algorithms. Informally, a learning algorithm is replicable if the algorithm outputs the same canonical hypothesis over multiple runs with high probability, even when different runs observe a different set of samples from the unknown data distribution. In general, such a strong notion of replicability is not achievable. Thus we consider two feasible notions of replicability called list replicability and certificate replicability. Intuitively, these notions capture the degree of (non)replicability. The goal is to design learning algorithms with optimal list and certificate complexities while minimizing the sample complexity. Our contributions are the following. 1. We first study the learning task of estimating the biases of $$d$$ coins, up to an additive error of $$\varepsilon$$, by observing samples. For this task, we design a $$(d+1)$$-list replicable algorithm. To complement this result, we establish that the list complexity is optimal, i.e., there are no learning algorithms with a list size smaller than $$d+1$$ for this task. We also design learning algorithms with certificate complexity $$\tilde{O}(\log d)$$. The sample complexity of both these algorithms is $$\tilde{O}(\frac{d^2}{\varepsilon^2})$$ where $$\varepsilon$$ is the approximation error parameter (for a constant error probability). 2. In the PAC model, we show that any hypothesis class that is learnable with $$d$$-nonadaptive statistical queries can be learned via a $$(d+1)$$-list replicable algorithm and also via a $$\tilde{O}(\log d)$$-certificate replicable algorithm. The sample complexity of both these algorithms is $$\tilde{O}(\frac{d^2}{\nu^2})$$ where $$\nu$$ is the approximation error of the statistical query. We also show that for the concept class \dtep, the list complexity is exactly $$d+1$$ with respect to the uniform distribution. To establish our upper bound results we use rounding schemes induced by geometric partitions with certain properties. We use the Sperner/KKM Lemma to establish the lower bound results.
Full Text Available
Replicability in Learning: Geometric Partitions and Sperner-KKM Lemma

Vander_Woude, Jason; Dixon, Peter; Pavan, A; Radcliffe, Jamie; Vinodchandran, NV (December 2023, Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024)

Full Text Available
Brief Announcement: Relations Between Space-Bounded and Adaptive Massively Parallel Computations

https://doi.org/10.4230/LIPIcs.DISC.2023.37

Chen, Michael; Pavan, A; Vinodchandran, N V (October 2023, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Oshman, Rotem (Ed.)
In this work, we study the class of problems solvable by (deterministic) Adaptive Massively Parallel Computations in constant rounds from a computational complexity theory perspective. A language L is in the class AMPC⁰ if, for every ε > 0, there is a deterministic AMPC algorithm running in constant rounds with a polynomial number of processors, where the local memory of each machine is s = O(N^ε). We prove that the space-bounded complexity class ReachUL is a proper subclass of AMPC⁰. The complexity class ReachUL lies between the well-known space-bounded complexity classes Deterministic Logspace (DLOG) and Nondeterministic Logspace (NLOG). In contrast, we establish that it is unlikely that PSPACE admits AMPC algorithms, even with polynomially many rounds. We also establish that showing PSPACE is a subclass of nonuniform-AMPC with polynomially many rounds leads to a significant separation result in complexity theory, namely that PSPACE is a proper subclass of EXP^{Σ₂^{𝖯}}.
Full Text Available
Model Counting Meets F₀ Estimation

https://doi.org/10.1145/3603496

Pavan, A.; Vinodchandran, N. V.; Bhattacharyya, Arnab; Meel, Kuldeep S. (September 2023, ACM Transactions on Database Systems)

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computation of zeroth frequency moments (F₀) for data streams. Our investigations lead us to observe a striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and F₀ computation. We design a recipe for translating algorithms developed for F₀ estimation to model counting, resulting in new algorithms for model counting. We also provide a recipe for transforming sampling algorithms over streams to constraint sampling algorithms. We then observe that algorithms in the context of distributed streaming can be transformed into distributed algorithms for model counting. We next turn our attention to viewing streaming from the lens of counting and show that framing F₀ estimation as a special case of #DNF counting allows us to obtain a general recipe for a rich class of streaming problems, which had been subjected to case-specific analysis in prior works. In particular, our view yields an algorithm for multidimensional range efficient F₀ estimation with a simpler analysis.
Full Text Available
Maximizing Submodular Functions under Submodular Constraints

Padmanabhan, Madhavan R; Zhu, Yanhui; Basu, Samik; Pavan, A. (July 2023, Uncertainty in Artificial Intelligence, UAI 2023)
Evans, Robin; Shpitser, Ilya (Ed.)
We consider the problem of maximizing submodular functions under submodular constraints by formulating the problem in two ways: \SCSKC and \DiffC. Given two submodular functions $$f$$ and $$g$$ where $$f$$ is monotone, the objective of the \SCSKC problem is to find a set $$S$$ of size at most $$k$$ that maximizes $$f(S)$$ under the constraint that $$g(S)\leq \theta$$, for a given value of $$\theta$$. The problem of \DiffC focuses on finding a set $$S$$ of size at most $$k$$ such that $$h(S) = f(S)-g(S)$$ is maximized. It is known that these problems are highly inapproximable and do not admit any constant factor multiplicative approximation algorithms unless NP is easy. Known approximation algorithms involve data-dependent approximation factors that are not efficiently computable. We initiate a study of the design of approximation algorithms where the approximation factors are efficiently computable. For the problem of \SCSKC, we prove that the greedy algorithm produces a solution whose value is at least $$(1-1/e)f(\mathrm{OPT}) - A$$, where $$A$$ is the data-dependent additive error. For the \DiffC problem, we design an algorithm that uses the \SCSKC greedy algorithm as a subroutine. This algorithm produces a solution whose value is at least $$(1-1/e)h(\mathrm{OPT})-B$$, where $$B$$ is also a data-dependent additive error. A salient feature of our approach is that the additive error terms can be computed efficiently, thus enabling us to ascertain the quality of the solutions produced.
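To illustrate the problem setup in this abstract, here is a minimal sketch of a generic greedy heuristic: it repeatedly adds the element with the largest marginal gain in f among those that keep g(S) ≤ θ, stopping at k elements. This is an assumption about what a greedy for this setting could look like, not the paper's algorithm, and it carries none of the stated guarantees. The function name `constrained_greedy` and the toy coverage and cost data are hypothetical.

```python
def constrained_greedy(ground, f, g, k, theta):
    """Pick at most k elements greedily by marginal gain in f, keeping g(S) <= theta."""
    chosen = set()
    while len(chosen) < k:
        best, best_gain = None, 0.0
        for e in sorted(ground - chosen):      # sorted for deterministic tie-breaking
            candidate = chosen | {e}
            if g(candidate) > theta:           # adding e would violate the constraint
                continue
            gain = f(candidate) - f(chosen)    # marginal gain of e
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:                       # no feasible element improves f
            break
        chosen.add(best)
    return chosen


# Toy instance (hypothetical): f is a coverage function, g is an additive cost.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4, 5}}
cost = {"a": 1, "b": 1, "c": 1, "d": 5}


def f(S):
    return len(set().union(*(covers[e] for e in S))) if S else 0


def g(S):
    return sum(cost[e] for e in S)


print(constrained_greedy(set(covers), f, g, k=2, theta=2))
```

Coverage functions are monotone submodular and additive costs are modular (hence submodular), so the toy instance fits the stated setting; element "d" covers everything but is excluded by the budget θ.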
Full Text Available
Constraint Optimization over Semirings

https://doi.org/10.1609/aaai.v37i4.25522

Pavan, A.; Meel, Kuldeep S.; Vinodchandran, N. V.; Bhattacharyya, Arnab (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Interpretations of logical formulas over semirings (other than the Boolean semiring) have applications in various areas of computer science including logic, AI, databases, and security. Such interpretations provide richer information beyond the truth or falsity of a statement. Examples of such semirings include the Viterbi semiring, the min-max or access control semiring, the tropical semiring, and the fuzzy semiring. The present work investigates the complexity of constraint optimization problems over semirings. The generic optimization problem we study is the following: Given a propositional formula phi over n variables and a semiring (K, +, ·, 0, 1), find the maximum value over all possible interpretations of phi over K. This can be seen as a generalization of the well-known satisfiability problem (a propositional formula is satisfiable if and only if the maximum value over all interpretations/assignments over the Boolean semiring is 1). A related problem is to find an interpretation that achieves the maximum value. In this work, we first focus on these optimization problems over the Viterbi semiring, which we call optConfVal and optConf. We first show that for general propositional formulas in negation normal form, optConfVal and optConf are in FP^NP. We then investigate optConf when the input formula phi is represented in conjunctive normal form. For CNF formulae, we first derive an upper bound on the value of optConf as a function of the maximum number of satisfiable clauses. In particular, we show that if r is the maximum number of satisfiable clauses in a CNF formula with m clauses, then its optConf value is at most 1/4^(m-r). Building on this, we establish that optConf for CNF formulae is hard for the complexity class FP^NP[log]. We also design polynomial-time approximation algorithms and establish an inapproximability result for optConfVal. We establish similar complexity results for these optimization problems over other semirings including tropical, fuzzy, and access control semirings.
Full Text Available
On Approximating Total Variation Distance

https://doi.org/10.24963/ijcai.2023/387

Bhattacharyya, Arnab; Gayen, Sutanu; Meel, Kuldeep S.; Myrisiotis, Dimitrios; Pavan, A.; Vinodchandran, N. V. (August 2023, International Joint Conference on Artificial Intelligence)

Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results. 1. The problem of exactly computing the TV distance of two product distributions is #P-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger which tensorize over the marginals leading to efficient algorithms. 2. There is a fully polynomial-time deterministic approximation scheme (FPTAS) for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions the relative approximation of their TV distance is NP-hard.
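The contrast drawn in this abstract can be seen in a small sketch. It assumes each product distribution over {0,1}^n is given as a list of marginals p_i = Pr[x_i = 1], and it takes the squared Hellinger distance to be 1 minus the Bhattacharyya coefficient. The brute-force TV computation enumerates all 2^n points and is not the paper's FPTAS; the function names are hypothetical.

```python
from itertools import product
from math import prod, sqrt


def point_mass(p, x):
    """Probability of the point x under the product distribution with marginals p."""
    return prod(pi if xi else 1.0 - pi for pi, xi in zip(p, x))


def tv_distance_bruteforce(p, q):
    """TV distance as half the L1 distance, summed over all 2^n points."""
    n = len(p)
    return 0.5 * sum(
        abs(point_mass(p, x) - point_mass(q, x)) for x in product((0, 1), repeat=n)
    )


def hellinger_squared(p, q):
    """Squared Hellinger distance, using the per-coordinate product formula."""
    # The Bhattacharyya coefficient factorizes over the marginals, so this takes O(n) time.
    bc = prod(sqrt(pi * qi) + sqrt((1.0 - pi) * (1.0 - qi)) for pi, qi in zip(p, q))
    return 1.0 - bc


p = [0.5, 0.2, 0.9]
q = [0.5, 0.5, 0.5]  # uniform distribution on {0,1}^3
print(tv_distance_bruteforce(p, q), hellinger_squared(p, q))
```

The Hellinger computation touches each coordinate once, while the TV computation has no comparable coordinate-wise shortcut here and pays for the full enumeration, which is the gap the hardness result formalizes.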
Full Text Available
